Overview

Dataset statistics

Number of variables: 5
Number of observations: 95010
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.6 MiB
Average record size in memory: 40.0 B
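As a rough consistency check, the memory figures match 8 bytes per cell. This is an assumption about how the sizes were measured: in a pandas DataFrame each 64-bit number takes 8 bytes, and when string contents are not measured each entry of a text column is counted as one 8-byte object pointer.

```python
# Assumption: 8 bytes per cell (64-bit numbers, or 8-byte object pointers
# for the two text columns when string contents are not measured).
N_ROWS = 95_010
N_COLUMNS = 5
BYTES_PER_CELL = 8

record_size = N_COLUMNS * BYTES_PER_CELL   # bytes per row
total_bytes = N_ROWS * record_size         # whole table
column_bytes = N_ROWS * BYTES_PER_CELL     # a single column

print(f"Average record size: {record_size:.1f} B")    # 40.0 B
print(f"Total size: {total_bytes / 2**20:.1f} MiB")   # 3.6 MiB
print(f"Per-column size: {column_bytes / 2**10:.1f} KiB")
```

The per-column estimate (about 742.3 KiB) sits just under the 742.4 KiB reported for each variable; the small gap is presumably the index overhead that pandas adds to each column's memory usage.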

Variable types

Numeric: 3
Text: 2

Reproduction

Analysis started: 2024-05-03 15:40:34.407960
Analysis finished: 2024-05-03 15:48:41.746915
Duration: 8 minutes and 7.34 seconds
Software version: ydata-profiling v4.7.0
Download configuration: config.json
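The reported duration is simply the difference between the two timestamps, which Python's standard `datetime` module can confirm:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-05-03 15:40:34.407960")
finished = datetime.fromisoformat("2024-05-03 15:48:41.746915")

duration = finished - started                      # a datetime.timedelta
minutes, seconds = divmod(duration.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")  # 8 minutes and 7.34 seconds
```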

Variables

similarity_score
Real number (ℝ)

Distinct: 90870
Distinct (%): 95.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7785381
Minimum: 0.739786
Maximum: 0.87120444
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 742.4 KiB

Quantile statistics

Minimum: 0.739786
5th percentile: 0.75973005
Q1: 0.76874564
Median: 0.7767409
Q3: 0.78595968
95th percentile: 0.80438816
Maximum: 0.87120444
Range: 0.13141844
Interquartile range (IQR): 0.017214041
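ydata-profiling obtains these quantiles from pandas. Assuming pandas' default linear interpolation, the quantile at fraction p lies at position p·(n − 1) in the sorted values, interpolating between the two neighbouring values. A stdlib sketch on a small hypothetical list (not the real data):

```python
import math

def quantile(values, p):
    """Quantile at fraction p using linear interpolation between neighbours."""
    data = sorted(values)
    pos = p * (len(data) - 1)          # fractional index into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)    # stay in range when p == 1
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac

sample = [0.74, 0.76, 0.77, 0.78, 0.79, 0.80, 0.87]   # hypothetical values
q1, median, q3 = (quantile(sample, p) for p in (0.25, 0.5, 0.75))
iqr = q3 - q1                                         # interquartile range
print(f"Q1={q1:.3f} median={median:.3f} Q3={q3:.3f} IQR={iqr:.3f}")
```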

Descriptive statistics

Standard deviation: 0.013764874
Coefficient of variation (CV): 0.017680411
Kurtosis: 1.2805986
Mean: 0.7785381
Median Absolute Deviation (MAD): 0.0085044335
Skewness: 0.87431604
Sum: 73968.905
Variance: 0.00018947175
Monotonicity: Not monotonic
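Several of these figures follow directly from others: range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation², and sum = mean × number of observations. The check below recomputes them from the printed (already rounded) values, so small relative errors are expected:

```python
# Reported values for similarity_score, copied from the statistics above.
n = 95_010
mean = 0.7785381
std = 0.013764874
minimum, maximum = 0.739786, 0.87120444
q1, q3 = 0.76874564, 0.78595968

# Each derived statistic paired with the value the report prints for it.
checks = {
    "Range": (maximum - minimum, 0.13141844),
    "Interquartile range (IQR)": (q3 - q1, 0.017214041),
    "Coefficient of variation (CV)": (std / mean, 0.017680411),
    "Variance": (std ** 2, 0.00018947175),
    "Sum": (mean * n, 73968.905),
}
for name, (derived, reported) in checks.items():
    rel_err = abs(derived - reported) / abs(reported)
    print(f"{name}: derived {derived:.8g}, reported {reported:.8g}, relative error {rel_err:.1e}")
```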
Histogram with fixed size bins (bins=50)
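Fixed-size bins split the interval [minimum, maximum] into 50 equal-width bins; for similarity_score each bin is 0.13141844 / 50 ≈ 0.00263 wide. A sketch of that binning, assuming the NumPy-style convention in which the last bin also includes the maximum (the report does not show the exact bin edges ydata-profiling uses, and values lying exactly on an edge may be assigned differently by floating-point rounding):

```python
def fixed_width_bins(values, bins=50):
    """Count values in `bins` equal-width bins spanning [min(values), max(values)].

    Every bin is half-open [left, right) except the last, which also
    includes the maximum (the NumPy convention).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        idx = int((x - lo) / width) if width > 0 else 0
        counts[min(idx, bins - 1)] += 1    # clamp the maximum into the last bin
    return counts, width

# Hypothetical values inside the reported similarity_score range.
values = [0.739786, 0.75, 0.7767409, 0.80, 0.87120444]
counts, width = fixed_width_bins(values, bins=50)
print(f"bin width = {width:.10f}")                # about 0.0026283688
print([i for i, c in enumerate(counts) if c])     # indices of non-empty bins
```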

Common values

Value: count (frequency)
0.7564311041: 9 (< 0.1%)
0.7608817269: 9 (< 0.1%)
0.7711655799: 9 (< 0.1%)
0.7679277082: 9 (< 0.1%)
0.7671090325: 9 (< 0.1%)
0.7670514205: 9 (< 0.1%)
0.7658925468: 9 (< 0.1%)
0.7658477805: 9 (< 0.1%)
0.764914634: 9 (< 0.1%)
0.7646556549: 9 (< 0.1%)
Other values (90860): 94920 (99.9%)

Minimum 10 values

Value: count (frequency)
0.7397860001: 1 (< 0.1%)
0.7401986584: 1 (< 0.1%)
0.7405600195: 1 (< 0.1%)
0.7408847744: 1 (< 0.1%)
0.74121344: 1 (< 0.1%)
0.7413294225: 1 (< 0.1%)
0.7414419418: 1 (< 0.1%)
0.7422464185: 1 (< 0.1%)
0.7423244582: 1 (< 0.1%)
0.7423705559: 1 (< 0.1%)

Maximum 10 values

Value: count (frequency)
0.8712044448: 2 (< 0.1%)
0.8695070585: 1 (< 0.1%)
0.8628570823: 1 (< 0.1%)
0.8613146561: 1 (< 0.1%)
0.8602318313: 2 (< 0.1%)
0.8599051395: 1 (< 0.1%)
0.8593465219: 1 (< 0.1%)
0.8578624039: 2 (< 0.1%)
0.8566986522: 2 (< 0.1%)
0.8558077554: 1 (< 0.1%)

skill_id
Real number (ℝ)

Distinct: 3020
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4061.8227
Minimum: 1
Maximum: 8741
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 742.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 382
Q1: 1930
Median: 3842
Q3: 6193
95th percentile: 8347
Maximum: 8741
Range: 8740
Interquartile range (IQR): 4263

Descriptive statistics

Standard deviation: 2514.3955
Coefficient of variation (CV): 0.61903133
Kurtosis: -1.0958914
Mean: 4061.8227
Median Absolute Deviation (MAD): 2006
Skewness: 0.22047571
Sum: 3.8591378 × 10⁸
Variance: 6322184.9
Monotonicity: Not monotonic
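The Median Absolute Deviation is the median of the absolute distances from the median. Judging by the integer MAD of 2006 for skill_id, the report applies no consistency scale factor (such as 1.4826); treat that as an inference. A stdlib sketch on a toy list built from the reported skill_id quantiles (not the real data):

```python
from statistics import median

def median_absolute_deviation(values):
    """Median of absolute deviations from the median (no 1.4826 scale factor)."""
    center = median(values)
    return median(abs(x - center) for x in values)

ids = [1, 382, 1930, 3842, 6193, 8347, 8741]   # toy list, not the real column
print(median_absolute_deviation(ids))          # 3460
```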
Histogram with fixed size bins (bins=50)

Common values

Value: count (frequency)
890: 2080 (2.2%)
521: 982 (1.0%)
7550: 885 (0.9%)
6232: 797 (0.8%)
2715: 710 (0.7%)
4290: 705 (0.7%)
4254: 674 (0.7%)
5484: 625 (0.7%)
6707: 569 (0.6%)
148: 556 (0.6%)
Other values (3010): 86427 (91.0%)
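The "Other values" row is what remains after the ten listed values: its count is the number of observations minus the ten listed counts, and its label gives the number of remaining distinct values. Recomputing it for skill_id from the table above:

```python
N_ROWS = 95_010     # Number of observations
DISTINCT = 3_020    # Distinct values of skill_id

# The ten most common skill_id values and their counts, from the table above.
top10 = {890: 2080, 521: 982, 7550: 885, 6232: 797, 2715: 710,
         4290: 705, 4254: 674, 5484: 625, 6707: 569, 148: 556}

other_count = N_ROWS - sum(top10.values())
other_distinct = DISTINCT - len(top10)
print(f"Other values ({other_distinct}): {other_count} ({other_count / N_ROWS:.1%})")
# Other values (3010): 86427 (91.0%)
```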

Minimum 10 values

Value: count (frequency)
1: 2 (< 0.1%)
3: 1 (< 0.1%)
7: 5 (< 0.1%)
8: 34 (< 0.1%)
9: 8 (< 0.1%)
10: 6 (< 0.1%)
20: 4 (< 0.1%)
22: 1 (< 0.1%)
25: 22 (< 0.1%)
29: 2 (< 0.1%)

Maximum 10 values

Value: count (frequency)
8741: 1 (< 0.1%)
8737: 2 (< 0.1%)
8727: 13 (< 0.1%)
8722: 50 (0.1%)
8721: 1 (< 0.1%)
8720: 74 (0.1%)
8712: 3 (< 0.1%)
8703: 41 (< 0.1%)
8697: 96 (0.1%)
8694: 21 (< 0.1%)

skill_name
Text

Distinct: 3020
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 742.4 KiB

Length

Max length: 100
Median length: 70
Mean length: 33.714662
Min length: 3

Characters and Unicode

Total characters: 3203230
Distinct characters: 77
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 658
Unique (%): 0.7%
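Distinct counts every different value, while Unique counts only the values that occur exactly once; that is why skill_name has 3020 distinct values but only 658 unique ones. A minimal sketch with `collections.Counter` on hypothetical titles:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

titles = ["Analyst", "Analyst", "Director", "Manager", "Manager"]  # hypothetical
print(distinct_and_unique(titles))   # (3, 1)
```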

Sample

1st row: Oracle HRIS
2nd row: Human resources management system HRMS
3rd row: OrangeHRM
4th row: Human resource management software HRMS
5th row: Human resource information system HRIS

Words

Value: count (frequency)
software: 15680 (4.1%)
management: 13005 (3.4%)
system: 10181 (2.6%)
manager: 9239 (2.4%)
systems: 8527 (2.2%)
and: 7412 (1.9%)
health: 4934 (1.3%)
information: 4204 (1.1%)
human: 4149 (1.1%)
resources: 3043 (0.8%)
Other values (3997): 305517 (79.2%)
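The word table appears to count lower-cased, whitespace-separated tokens: every listed word is lower case, while the sample rows are not. A sketch under that assumption, applied to the five sample rows; ydata-profiling's actual tokenization (for instance its handling of punctuation) may differ:

```python
from collections import Counter

# The five sample rows of skill_name shown above.
samples = [
    "Oracle HRIS",
    "Human resources management system HRMS",
    "OrangeHRM",
    "Human resource management software HRMS",
    "Human resource information system HRIS",
]
# Assumption: lower-case each value, then split on whitespace.
words = Counter(word for text in samples for word in text.lower().split())
total = sum(words.values())
for word, count in words.most_common(3):
    print(f"{word}: {count} ({count / total:.1%})")
```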

Most occurring characters

Value: count (frequency)
(space): 385891 (12.0%)
e: 283452 (8.8%)
a: 240364 (7.5%)
n: 208685 (6.5%)
t: 208651 (6.5%)
o: 177847 (5.6%)
r: 173011 (5.4%)
i: 172283 (5.4%)
s: 158686 (5.0%)
l: 93306 (2.9%)
Other values (67): 1101054 (34.4%)
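Total characters counts every character, spaces included, so it equals mean length × number of observations (33.714662 × 95010 ≈ 3,203,230). A stdlib sketch of the character tally on two of the sample rows:

```python
from collections import Counter

def character_profile(texts):
    """Total characters, mean length and per-character counts (spaces included)."""
    counts = Counter(ch for text in texts for ch in text)
    total = sum(counts.values())
    return total, total / len(texts), counts

total, mean_len, counts = character_profile(["Oracle HRIS", "OrangeHRM"])
print(total, mean_len)              # 20 10.0
print(counts.most_common(3))

# The report's own figures agree: mean length x observations = total characters.
print(round(33.714662 * 95_010))    # 3203230
```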

Most occurring categories

Value: count (frequency)
(unknown): 3203230 (100.0%)

Most frequent character per category

(unknown): identical to the Most occurring characters table, since every character falls into the single (unknown) category.

Most occurring scripts

Value: count (frequency)
(unknown): 3203230 (100.0%)

Most frequent character per script

(unknown): identical to the Most occurring characters table, since every character falls into the single (unknown) script.

Most occurring blocks

Value: count (frequency)
(unknown): 3203230 (100.0%)

Most frequent character per block

(unknown): identical to the Most occurring characters table, since every character falls into the single (unknown) block.

job_id
Real number (ℝ)

Distinct: 3154
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 589158.95
Minimum: 469953
Maximum: 616704
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 742.4 KiB

Quantile statistics

Minimum: 469953
5th percentile: 536654
Q1: 580813
Median: 596149
Q3: 606955
95th percentile: 614951
Maximum: 616704
Range: 146751
Interquartile range (IQR): 26142

Descriptive statistics

Standard deviation: 24703.789
Coefficient of variation (CV): 0.0419306
Kurtosis: 2.210657
Mean: 589158.95
Median Absolute Deviation (MAD): 11966
Skewness: -1.4938334
Sum: 5.5975992 × 10¹⁰
Variance: 6.1027717 × 10⁸
Monotonicity: Not monotonic
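A negative skewness, as here, indicates a longer tail toward small job_id values. Assuming the report takes skewness and kurtosis from pandas (`Series.skew()` and `Series.kurt()`, which pandas documents as unbiased, N − 1 normalized estimators, with kurtosis in Fisher's "excess" form so that a normal distribution scores 0), a stdlib sketch of those estimators is:

```python
import math

def sample_skew_kurt(values):
    """Bias-adjusted skewness (G1) and excess kurtosis (G2)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    # Central moments with 1/n normalization.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    # Small-sample corrections.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt

print(sample_skew_kurt([1, 2, 3, 4, 5]))      # skewness 0.0, kurtosis about -1.2
print(sample_skew_kurt([1, 17, 18, 19, 20]))  # long left tail: negative skewness
```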
Histogram with fixed size bins (bins=50)

Common values

Value: count (frequency)
592883: 60 (0.1%)
587399: 60 (0.1%)
587838: 60 (0.1%)
582993: 60 (0.1%)
543373: 60 (0.1%)
597966: 60 (0.1%)
616094: 60 (0.1%)
602801: 60 (0.1%)
607208: 60 (0.1%)
613292: 60 (0.1%)
Other values (3144): 94410 (99.4%)

Minimum 10 values

Value: count (frequency)
469953: 30 (< 0.1%)
470441: 30 (< 0.1%)
470567: 30 (< 0.1%)
472791: 30 (< 0.1%)
473825: 30 (< 0.1%)
479039: 30 (< 0.1%)
481622: 30 (< 0.1%)
482229: 30 (< 0.1%)
483286: 30 (< 0.1%)
483469: 30 (< 0.1%)

Maximum 10 values

Value: count (frequency)
616704: 30 (< 0.1%)
616699: 30 (< 0.1%)
616697: 30 (< 0.1%)
616692: 30 (< 0.1%)
616691: 30 (< 0.1%)
616636: 30 (< 0.1%)
616634: 30 (< 0.1%)
616580: 30 (< 0.1%)
616570: 30 (< 0.1%)
616564: 30 (< 0.1%)

job_title
Text

Distinct: 2179
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 742.4 KiB

Length

Max length: 147
Median length: 92
Mean length: 33.479949
Min length: 5

Characters and Unicode

Total characters: 3180930
Distinct characters: 72
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Business Operations Analyst
2nd row: Business Operations Analyst
3rd row: Business Operations Analyst
4th row: Business Operations Analyst
5th row: Business Operations Analyst

Words

Value: count (frequency)
of: 20310 (5.2%)
bureau: 13680 (3.5%)
director: 9840 (2.5%)
manager: 9600 (2.4%)
and: 9480 (2.4%)
health: 8100 (2.1%)
assistant: 7590 (1.9%)
analyst: 7320 (1.9%)
specialist: 7230 (1.8%)
(blank): 6120 (1.6%)
Other values (1341): 294390 (74.8%)

Most occurring characters

Value: count (frequency)
(space): 395880 (12.4%)
e: 229590 (7.2%)
i: 185880 (5.8%)
t: 176730 (5.6%)
r: 175680 (5.5%)
a: 175050 (5.5%)
n: 163860 (5.2%)
o: 152490 (4.8%)
s: 108930 (3.4%)
A: 84990 (2.7%)
Other values (62): 1331850 (41.9%)

Most occurring categories

Value: count (frequency)
(unknown): 3180930 (100.0%)

Most frequent character per category

(unknown): identical to the Most occurring characters table, since every character falls into the single (unknown) category.

Most occurring scripts

Value: count (frequency)
(unknown): 3180930 (100.0%)

Most frequent character per script

(unknown): identical to the Most occurring characters table, since every character falls into the single (unknown) script.

Most occurring blocks

Value: count (frequency)
(unknown): 3180930 (100.0%)

Most frequent character per block

(unknown): identical to the Most occurring characters table, since every character falls into the single (unknown) block.

Interactions


Correlations

                  job_id  similarity_score  skill_id
job_id             1.000            -0.151    -0.019
similarity_score  -0.151             1.000     0.055
skill_id          -0.019             0.055     1.000
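The report does not state which coefficient this matrix holds; ydata-profiling can compute several (Pearson, Spearman and others). As an illustration only, here is a Pearson matrix over hypothetical toy columns, showing the same symmetric layout with 1.000 on the diagonal:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy columns, not the report's data.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.5, 6.0, 8.0],
    "c": [4.0, 3.0, 2.5, 1.0],
}
names = list(columns)
matrix = [[pearson(columns[r], columns[c]) for c in names] for r in names]
for name, row in zip(names, matrix):
    print(f"{name:>2} " + " ".join(f"{v:7.3f}" for v in row))
```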

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that makes patterns in data completion quick to spot visually.
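For this dataset every missing-value figure in the overview is 0. Missing cells counts null cells across all columns, and Missing cells (%) presumably divides that by rows × columns. A sketch of that bookkeeping on hypothetical records (plain dicts standing in for a DataFrame):

```python
# Hypothetical records; None marks a missing cell.
rows = [
    {"skill_id": 5625, "skill_name": "Oracle HRIS"},
    {"skill_id": None, "skill_name": "OrangeHRM"},
]
columns = ["skill_id", "skill_name"]

# Missing cells per column, then the dataset-level totals.
missing_per_column = {col: sum(row.get(col) is None for row in rows) for col in columns}
missing_cells = sum(missing_per_column.values())
missing_cells_pct = 100 * missing_cells / (len(rows) * len(columns))
print(missing_per_column, missing_cells, f"{missing_cells_pct:.1f}%")
```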

Sample

First rows

index | similarity_score | skill_id | skill_name | job_id | job_title
0 | 0.83389015353728035 | 5625 | Oracle HRIS | 606346 | Business Operations Analyst
1 | 0.82431666402027548 | 3603 | Human resources management system HRMS | 606346 | Business Operations Analyst
2 | 0.82425782201016473 | 5682 | OrangeHRM | 606346 | Business Operations Analyst
3 | 0.82180843221890232 | 3602 | Human resource management software HRMS | 606346 | Business Operations Analyst
4 | 0.81899765731401342 | 3601 | Human resource information system HRIS | 606346 | Business Operations Analyst
5 | 0.81785952968766662 | 5651 | Oracle PeopleSoft Enterprise Human Resources | 606346 | Business Operations Analyst
6 | 0.81739794236492302 | 5613 | Oracle E-Business Suite Human Resources Management System | 606346 | Business Operations Analyst
7 | 0.81655120417300819 | 1873 | Consultants in Data Processing HRnet | 606346 | Business Operations Analyst
8 | 0.81490089810122257 | 4341 | Lawson Human Resource Management | 606346 | Business Operations Analyst
9 | 0.8137670379005908 | 5654 | Oracle PeopleSoft Human Capital Management | 606346 | Business Operations Analyst

Last rows

index | similarity_score | skill_id | skill_name | job_id | job_title
95000 | 0.77717135700741291 | 7218 | Softrail AEI Rail & Road Manager | 561563 | Deputy Director - Long-Range Planning and Policy
95001 | 0.77629480951944985 | 2282 | Digital Crew Teamwork Project Manager | 561563 | Deputy Director - Long-Range Planning and Policy
95002 | 0.7747333213877996 | 5996 | PlanGraphics Citywide GIS Utility | 561563 | Deputy Director - Long-Range Planning and Policy
95003 | 0.77442286429250728 | 7740 | Texas Transportation Institute TTI Progression Analysis and Signal System Evaluation Routine PASSER | 561563 | Deputy Director - Long-Range Planning and Policy
95004 | 0.77440746725609322 | 5210 | Municipal geographic management software | 561563 | Deputy Director - Long-Range Planning and Policy
95005 | 0.77424792870754899 | 3168 | GEOCOMtms A.Maze Planning | 561563 | Deputy Director - Long-Range Planning and Policy
95006 | 0.77414005781220974 | 6674 | Route planning software | 561563 | Deputy Director - Long-Range Planning and Policy
95007 | 0.77373875493571664 | 8265 | Vision Management Consulting IEP PlaNET | 561563 | Deputy Director - Long-Range Planning and Policy
95008 | 0.77370806262038472 | 261 | Adaptive Planning | 561563 | Deputy Director - Long-Range Planning and Policy
95009 | 0.77339079021347656 | 7910 | Total Officer Personnel Management Information System TOPMIS | 561563 | Deputy Director - Long-Range Planning and Policy